perm filename QUAL.TEX[DEK,DEK] blob
sn#658751 filedate 1982-05-18 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00004 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 \input basic
C00003 00003 \ctrline{\bf SOLUTIONS TO ANALYSIS OF ALGORITHMS QUAL}
C00020 00004 \vfill\end
C00034 ENDMK
C⊗;
\input basic
\magnify{1300}
\font t=cmtt
\chcode↔=13 \def↔{\penalty999\ }
\def\yskip{\vskip 3pt plus 1pt minus 1pt}
\def\\{\,.\,.\,}
\def\display#1:#2:#3\par{\par\hangindent #1 \noindent
\hbox to #1{\hfill #2 \hskip .1em}\!#3 \par}
\ctrline{\bf SOLUTIONS TO ANALYSIS OF ALGORITHMS QUAL}
\vskip 10pt
\noindent{\bf Problem 1.}
\noindent {\sl Answer to part a.}\quad
If $x↓1\ldotsm x↓n$ is a permutation of $\{1,\ldotss,n\}$, let
$\=x↓1\ldotsm\=x↓{n-1}$ be the permutation of $\{1,\ldotss,n-1\}$ that
arises when the elements of $x↓1\ldotsm x↓{n-1}$ that exceed $x↓n$ are
reduced by 1. The permutation $x↓1\ldotsm x↓n$ leads to the root $(k\\k+1)$
if and only if: (1) $x↓n<k$ and $\=x↓1\ldotsm\=x↓{n-1}$ leads to the
root $(k-1\\k)$. (2) $x↓n=k$ and $\=x↓1\ldotsm\=x↓{n-1}$ leads to the
root $(k-1\\k)$ and either $k-1<n-k$ or ($k-1=n-k$ and a random coin flip
comes up heads). (3)↔$x↓n=k+1$ and $\=x↓1\ldotsm\=x↓{n-1}$ leads to the root
$(k\\k+1)$ and either $k>n-1-k$ or ($k=n-1-k$ and a random coin flip comes
up tails). (4)↔$x↓n>k+1$ and $\=x↓1\ldotsm\=x↓{n-1}$ leads to the root
$(k\\k+1)$. Therefore we find, for $1≤k<n$ and $n>2$,
$$\twoline{P↓{nk}=
P↓{(n-1)(k-1)}\biglp k-1+(n+1>2k)+\textstyle{1\over2}(n+1=2k)\bigrp}{3pt}{\null+
P↓{(n-1)k}\biglp n-k-1+(n+1<2k)+\textstyle{1\over2}(n-1=2k)\bigrp.}$$
\yskip\noindent {\sl Answer to part b.}\quad
It is easy to see that $P↓{nk}=P↓{n(n-k)}$, so $Q↓{nk}=Q↓{n(n-k)}$. Thus it
suffices to consider $k≤n-k$. If $k<n-k-1$, the above recurrence reads
$$\twoline{{Q↓{nk}k!(n-k)!\over2(n-k)}=
{Q↓{(n-1)(k-l)}(k-1)!(n-k)!\over2(n-k)}(k-1+1)}{3pt}{\null+
{Q↓{(n-1)k}k!(n-1-k)!\over2(n-k-1)}(n-k-1),}$$
i.e., $Q↓{nk}=Q↓{(n-1)(k-1)}+Q↓{(n-1)k}$. If $k=n-k$, it reads
$$\twoline{{Q↓{nk}k!\,k!\over2k}={Q↓{(n-1)(k-1)}(k-1)!\,k!\over2k}(k-1+1)}{3pt}{\null+
{Q↓{(n-1)k}k!(k-1)!\over2k}(k-1+1),}$$
so Pascal's relation holds again. But if $k=n-k-1$, we have
$${Q↓{nk}k!(k+1)!\over2(k+1)}={Q↓{(n-1)(k-1)}(k-1)!(k+1)!\over2(k+1)}(k-1+1)+
{Q↓{(n-1)k}k!\,k!\over2k}(k+\textstyle{1\over2}),$$
hence $Q↓{nk}=Q↓{(n-1)(k-1)}+Q↓{(n-1)k}+{1\over n-1}Q↓{(n-1)k}$.
By symmetry, if $k=n-k+1$ we have
$Q↓{nk}=Q↓{(n-1)(k-1)}+Q↓{(n-1)k}+{1\over n-1}Q↓{(n-1)(k-1)}$. Pascal's
relation therefore holds except when $(n,k)=(2,1)$ or when $|n-2k|=1$.
\yskip\noindent {\sl Answer to part c.}\quad
It is convenient to tip the triangle sideways and to associate $Q↓{nk}$ with
the point $(k,n-k)$ in a grid. We can interpret $Q↓{nk}$ as 4 times the sum,
over all paths from $(1,1)$ to $(k,n-k)$, of the products of the weights
of the edges, where edges run from $(i,j)$ to $(i+1,j)$ and to $(i,j+1)$;
the weight of such an edge is↔1, except when $i=j$ it is $1+1/(2j)$.
Now $a↓k$ is 4 times the sum over paths from $(1,1)$ to $(k,k)$, so we can break the
sum into various sub-sums depending on the greatest diagonal point $(j,j)$
on the path, for $j<k$. The $j$th sub-sum is $a↓j$ times $1+1/(2j)$ times
the number of subpaths from $(j,j)$ to $(k,k)$ that do not touch the diagonal,
since all edge weights but the first are 1 on such subpaths. There are
$2c↓{k-j}$ such subpaths.
\yskip\noindent {\sl Answer to part d.}\quad
Since $kf↓k=\sum↓j(2j+1)f↓jc↓{k-j}+4(k=1)$, we have $zF↑\prime(z)=4z+C(z)
\biglp2zF↑\prime(z)+F(z)\bigrp$, and this simplifies to
$$z\sqrt{1-4z}\,F↑\prime(z)=4z+C(z)F(z).$$
Following the hint, which follows from the general method of finding an
integrating factor for first-order differential equations, we find
$$\eqalign{\biglp B(z)F(z)\bigrp↑\prime
⊗=B(z)F↑\prime(z)-F(z)/\sqrt{1-4z}\cr
\noalign{\vskip3pt}
⊗={B(z)\over z\sqrt{1-4z}}\biglp z\sqrt{1-4z}\,F↑\prime(z)-C(z)F(z)\bigrp\cr
\noalign{\vskip3pt}
⊗=4B(z)/\sqrt{1-4z}=2/\sqrt{1-4z}+2.\cr}$$
Thus $B(z)F(z)=2C(z)+2z$, and in a few more steps we find $a↓n=2n(c↓n+c↓{n+1})
=n{2n\choose n}\biglp{1\over2n-1}+{2\over n+1}\bigrp$, for $n≥1$.
\yskip\noindent {\sl Answer to part e.}\quad
$(1-w-z)Q(z)=\sum w↑kz↑{n-k}(Q↓{nk}-Q↓{(n-1)k}-Q↓{(n-1)(k-1)})=
4wz+{1\over2}(w+z)(f↓1wz+f↓2w↑2z↑2+f↓3w↑3z↑3+\cdotss)$, hence we have
$$Q(w,z)=(1-w-z)↑{-1}\biglp4wz+\textstyle{1\over2}(w↑{-1}-w+z↑{-1}-z
-(w↑{-1}+w+z↑{-1}+z)\sqrt{1-4wz}\,)\bigrp.$$
\yskip\noindent {\sl Answer to part f.}\quad
The coefficient of $w↑nz↑{n+r}$ in $g(wz)h(w,z)$ is the coefficient of
$x↑n$ in $g(x)h↓r(x)$, if $h↓r(x)=\sum a↓{m(m+r)}x↑m$ and $h(w,z)=
\sum a↓{mn}w↑mz↑n$, since multiplication by $g(wz)$ affects only the coefficients
having the same exponent offset. Hence the coefficient of $w↑nz↑{n+r}$ in
$\sqrt{1-4wz}/(1-w-z)$ is the coefficient of $x↑n$ in $\sqrt{1-4x}\sum
{2n+r\choose n}x↑n=\biglp C(x)/x\bigrp↑r=C(x)↑r\biglp B(x)-C(x)\bigrp x↑{-r}/
\sqrt{1-4x}=\biglp C(x)/x\bigrp↑{r-1}/\sqrt{1-4x}-
x\biglp C(x)/x\bigrp↑{r+1}/\sqrt{1-4x}=\sum\biglp{2n+r-1\choose n}-
{2n+r-1\choose n-1}\bigrp x↑n$.
\yskip\noindent {\sl Answer to part g.}\quad
For $r>0$, the coefficient of $w↑nz↑{n+r}$ in $Q(w,z)$ can now be computed by
considering the various terms in part (e). Let $b↓r={2n+r\choose n}$. Then
$Q↓{(2n+r)n}=4(b↓{r-1}-b↓{r-2})+{1\over2}\biglp{2n+r+1\choose n+1}-b↓r+b↓{r-1}
+b↓{r+1}-b↓{r-1}-{2n+r\choose n+1}+\linebreak
b↓r-b↓{r-1}+b↓{r-2}+b↓r-2b↓{r-1}+b↓{r-2}
-b↓r+b↓{r+1}-b↓r-b↓{r-2}+b↓{r-1}-b↓{r-2}\bigrp=b↓{r+1}+3b↓{r-1}-4b↓{r-2}$.
Multiply by ${1\over2}(n+r-1)!\,n!/(2n+r)!$ to get $p↓{(2n+r)n}$. A
dif-\linebreak
ferent formula applies when $r=0$, because of the $w↑{-1}$ and $z↑{-1}$ terms.
\yskip\noindent {\sl Final comment: A note to J. H. Quick.}\quad
``When $x=k/n<{1\over2}$ we have $p↓{nk}\approx{1\over2}\biglp(1-x)↑{-2}-1\bigrp
+2x$, hence ILBT's do a reasonably good job of partitioning. The distribution
of permutations in the left and right subtrees is not random, and we could
perhaps pursue the analysis to find the average path length of ILBT's.
But really, Mr.\ Quick, your algorithm still does not deserve to be
implemented. The average path length will be somewhere between $2n\ln n$ and
$(1/\!\ln2)\,n\ln n$; the extra time your method takes at each node slows
the program down so much that the slightly smaller path length is pointless.
It was clear from the start that ILBT's would lose out to other methods in
all respects (space, time, ease of implementation, and so on).
The only saving feature was that your algorithms lead to instructive
mathematics that we might someday be able to apply to the analysis of a
really useful method. You undoubtedly knew that too, so thanks for
the ideas.''
\vskip10pt
\noindent{\bf Problem 2.}
The problem is answered completely by showing that (1) $Q↓1$ is
linear, and (2) $Q↓2$ is $\Nscr\Pscr$-complete.
(1) Construct a digraph whose nodes are the variables and with an arc
$p→q$ if $p⊃q$ is a given 1-Horn formula. Do the following steps, each of
which is linear in the number of arcs.
\display 25pt: i): Find strong components.
\display 25pt: ii): Replace each strong component by a single node, with an
arc $n↓1→n↓2$ if $n↓1≠n↓2$, and some member of the strong component
represented by $n↓1$ has an arc to some member of the strong component
represented by $n↓2$.
\display 25pt: iii): In the resulting dag, find all roots, that is,
nodes with no incoming arc.
\display 25pt: iv): For each root, select one node from the corresponding
strong component.
The result is evidently a solution, and it is minimal because without one
node from a root strong component, there is no way any node in that strong
component can be implied.
\yskip
(2) We reduce node cover to $Q↓2$. Let $G=(V,E)$ be a graph with $n$ nodes
and $m$ edges. We construct a set of 2-Horn clauses whose variables correspond
to the nodes and edges, plus certain auxiliaries. The idea behind the
reduction is that a minimal set of variables may be assumed to
consist only of node
variables, and these nodes must form a node cover if they are to be a
solution to the Horn clause problem. We arrange that each edge is implied
by the nodes at its ends, and once all the edges are true, all the nodes
are implied.
Formally, let the nodes be $v↓1,v↓2,\ldotss,v↓n$ and the edges
$e↓1,e↓2,\ldotss,e↓m$. The variables are:
$$\eqalign{⊗v↓i\hbox{ for } 1≤i≤n\cr
⊗e↓j\hbox{ for }1≤j≤m\cr
⊗x↓{ij}\hbox{ for }1≤i≤n,\; 2≤j<m\,.\cr}$$
The clauses are:
\display 25pt: 1.: If edge $e↓j$ is $(v↓p,v↓q)$, then there are clauses
$v↓p⊃e↓j$ and $v↓q⊃e↓j$.
\display 25pt: 2.: For each $i$, we have the clauses
$$\eqalign{⊗e↓1∧e↓2⊃x↓{i2}\cr
⊗x↓{ij}∧e↓{j+1}⊃x↓{i,j+1}\hbox{ for } 2≤j<m-1\cr
⊗x↓{i,m-1}∧e↓m⊃v↓i\,.\cr}$$
The structure of the implications is thus
$$\vcenter{\halign{\lft{#}\qquad⊗$\ctr{#}$\quad⊗$\ctr{#}$\cr
⊗v↓1\,v↓2\ldotsm v↓i\ldotsm v↓n\cr
\noalign{\vskip 5pt}
⊗x↓{i,m-1}\cr
\noalign{\vskip 5pt}
typical column⊗\vdots\cr
\noalign{\vskip 5pt}
⊗x↓{i3}\cr
\noalign{\vskip 5pt}
⊗x↓{i2}\cr
\noalign{\vskip 5pt}
⊗e↓1⊗e↓2\hskip10pt\ldots\hskip40pt e↓m\cr}}$$
\vskip40pt
Suppose some set of variables $S$ is a solution. We modify $S$ as follows to
get a solution that is no larger and has only $v↓i$'s as members.
\display 25pt: i): If $x↓{ij}$ is in $S$, replace it by $v↓i$. The result
is still a solution, because it doesn't change the set of $v$'s that are
implied, which means all $v$'s are still implied. Therefore all $e$'s are
still implied, and thus so are all $x$'s. Note that the fact that $x↓{ij}$
helps imply no other $v$ but $v↓i$ is important here.
\display 25pt: ii): If $e↓j$ is in $S$, replace it by one of its endpoints,
say $v↓i$. Then since $v↓i⊃e↓j$, all $e$'s are still implied, therefore so
are all $x$'s and all $v$'s.
We conclude that if there is a solution to the Horn clause problem of size $k$,
then there is one consisting only of $v$'s. But this set of nodes must be a
node cover, or else some $e$ variable is not implied. Conversely, a node
cover of size $k$ is a solution to the Horn clause problem. Thus $G$ has a
node cover of size $k$ if and only if the constructed problem is in $Q↓2$.
\vfill\eject\end
\vfill\end
\noindent{\bf Problem 1. More on LBTs.}
This is a sequel to a problem that was on the recent CS255 midterm, so
we begin by quoting from it:
\hsize 6truein
\def\|{\hangindent .5truein after 0}
\yskip
\|A student named J. H. Quick woke up one morning with an idea for a new kind of
binary search tree. He had learned about the advantages of ``late binding''
in his studies of computer science, and he thought: ``{\sl Why should I use
the first key to decide how the rest of the tree will be partitioned? I could
do better by postponing that decision and letting further keys influence
what happens.}'' Running to his interactive workstation, he hastily prepared
a file containing a description of his new data structure, which he
chose to call Late Binding Trees (LBTs); and then he ate breakfast.
\|Unfortunately there is not room here to describe the subsequent events in
Quick's life; the story about his fateful encounters with the Chuvstvenni
sisters in Gstaad, who vowed to stop at nothing until they had learned his
secret, will probably never be told. Let us rather turn our attention to the
specifics of LBTs, suppressing the details of how this information was
learned.
\|There are two types of nodes: branch nodes and leaves. A {\sl branch node\/}
contains two keys $a$ and $b$, where $a<b$, and it also contains two links
$l$ and $r$ that point to subtrees. All keys in the $l$ subtree are $≤a$,
and all keys in the $r$ subtree are $≥b$. Such a node can be represented by
the notation `$(a\\b)$', having its subtrees drawn below. A {\sl leaf node\/}
contains a full record, including a key $a$; such a node can be represented by
`$[a]$'.
\|LBTs are never empty; they start out with a single (leaf) node. The left
subtree of a branch node $(a\\b)$ contains the leaf node $[a]$, and the
right subtree contains $[b]$. If we want to insert a new record with
key $x$ into a given LBT, we proceed as follows, assuming that $x$ is
different from all keys already in the tree:
\yskip\itemitem{(1)} If the LBT is $[a]$, and if $a<x$, change the LBT to
$(a\\x)$, with left subtree $[a]$ and right subtree $[x]$. A similar construction
with $a$ and $x$ interchanged is used if $x<a$.
\yskip\itemitem{(2)} If the LBT has root $(a\\b)$ and if $x<a$, insert the new
record into the left subtree, using the same method recursively.
\yskip\itemitem{(3)} If the LBT has root $(a\\b)$ and if $x>b$, insert the new
record into the right subtree, using the same method recursively.
\yskip\itemitem{(4)} If the LBT has root $(a\\b)$ and if $a<x<b$, flip a truly
random coin. If it comes up heads, change the root to $(x\\b)$ and insert the
new record in the left subtree; otherwise change the root to $(a\\x)$ and
insert the new record in the right subtree.
\yskip\|\noindent The idea is therefore to keep track of a range of possible
splitting keys in the root of the tree, instead of deciding prematurely
on a particular one.
\yskip\yskip\ctrline{$\null\ast{\quad\over}\ast{\quad\over}\ast$}
\yskip\yskip\hsize 6.5truein\noindent
Well, the result of that midterm problem was to analyze LBTs and to show that
their average path length is about the same as simple binary search trees.
But shortly after the midterm was graded, our sources discovered that
Quick was undaunted by that analysis. According to reliable reports, he
recently decided to try salvaging his
idea by including new information in each node.
The nodes in Quick's new data
structures, which he calls ILBTs (Improved Late Binding Trees), contain
a size field that tells how many leaves are in the subtree rooted at that
node. Step (4) is now replaced by a new step: In case the root is being
``split,'' the insertion continues in whichever subtree is currently
smaller. (If the subtree sizes are equal, a random decision is made as before.)
The purpose of this problem is to carry out a ``top level'' analysis of
Quick's new algorithm. Let $p↓{nk}$ be the probability that the root is
$(k\\k+1)$ after inserting a random permutation of $\{1,\ldotss,n\}$.
(We assume that all permutations of the $x$'s are equally likely; first
$x↓1$ is made into an ILBT by itself, then $x↓2$ through $x↓n$ are
inserted one by one.) Let $P↓{nk}=n!\,p↓{nk}$. Then it can be verified that
we have the following values of $P↓{nk}$ for $1≤k<n$ and $1≤n≤6$:
$$\baselineskip15pt
\vbox{\halign{$#$:⊗\quad\hfil#⊗\quad\hfil#⊗\quad\hfil#⊗\quad\hfil#⊗\quad\hfil#\cr
n=2⊗2\cr
n=3⊗3⊗3\cr
n=4⊗6⊗12⊗6\cr
n=5⊗18⊗42⊗42⊗18\cr
n=6⊗72⊗162⊗252⊗162⊗72\cr}}$$
\yskip\noindent{\bf Part a.}\quad
Find a recurrence relation that defines the numbers $P↓{nk}$.
\yskip\noindent{\bf Part b.}\quad
Let $Q↓{nk}=2P↓{nk}\max(k,n-k)/\biglp k!(n-k)!\bigrp$,
so that we have the following triangle:
$$\baselineskip15pt
\vbox{\halign{$#$:⊗\quad\hfil#⊗\quad\hfil#⊗\quad\hfil#⊗\quad\hfil#⊗\quad\hfil#\cr
n=2⊗4\cr
n=3⊗6⊗6\cr
n=4⊗6⊗12⊗6\cr
n=5⊗6⊗21⊗21⊗6\cr
n=6⊗6⊗27⊗42⊗27⊗6\cr}}$$
Show that for most values of $n$ and $k$ the numbers $Q↓{nk}$ satisfy the
same recurrence as Pascal's triangle, i.e., $Q↓{nk}=Q↓{(n-1)k}+Q↓{(n-1)(k-1)}$.
Find all the exceptions, and state the recurrence obeyed at the exceptional
points.
\yskip\noindent{\bf Part c.}\quad
Let $a↓k=Q↓{(2k)k}$. Prove that for $k>1$,
$$a↓k=\sum↓{1≤j<k}{2j+1\over j}\,a↓jc↓{k-j},$$
where $c↓n$ is the number of binary trees with $n$ external nodes.
\yskip\noindent{\bf Part d.}\quad
Let $B(z)={1\over2}(1+\sqrt{1-4z})$ and $C(z)={1\over2}(1-\sqrt{1-4z})$,
so that $B(z)+C(z)=1$, $B(z)-C(z)=\sqrt{1-4z}$, $B(z)C(z)=z$, and
$C(z)↑2=C(z)-z$; recall that $C(z)$ is the generating function $c↓1z+c↓2z↑2+
c↓3z↑3+\cdots$ for binary trees. Let $f↓k=a↓k/k$, and set up the
generating function $F(z)=f↓1z+f↓2z↑2+\cdotss$. Convert the recurrence in
part (c) to a differential equation for $F$, and solve this equation to
obtain a ``closed form'' for $a↓k$. [{\sl Possible hint:\/} Show that the
derivative of $B(z)F(z)$ has a simple form.]
\yskip\noindent{\bf Part e.}\quad
Apply the recurrence of part (b) to the generating function $Q(w,z)=
\sum↓{k,n}Q↓{nk}w↑kz↑{n-k}$, and use the values of $a↓k$ found in part (d)
to obtain a formula for $Q(w,z)$ as an explicit function of $w$ and $z$.
\yskip\noindent{\bf Part f.}\quad
Find a ``simple'' expression for the coefficient of $w↑nz↑{n+r}$ in the
power series for $\sqrt{1-4wz}/(1-w-z)$, when $r≥0$. [{\sl Hint:\/} Consider
the problem for fixed $r$ and variable $n$. You may wish to use the identity
$C(z)↑s/\sqrt{1-4z}=\sum↓n{2n+s\choose n}z↑{n+s}$ and the facts about
$B(z)$ and $C(z)$ that are stated in part (d).]
\yskip\noindent{\bf Part g.}\quad
Show that, therefore,
$$p↓{nk}={1\over2}\left({k+1\over n-k}-{k\over n-k+1}\right)
-{1\over2n}+{2k\over n(n-1)}\qquad\hbox{for $1≤k<{1\over2}n$.}$$
Note: Do NOT simply take this formula or an equivalent one and prove it
by induction. You should present a scenario that explains how you could
have discovered this solution by yourself in a systematic manner without
lucky guesses.
\yskip
\noindent{\bf Problem 2.}
For the purpose of this problem, a {\sl $k$-Horn formula\/} is an expression
$$p↓1∧p↓2∧\ldotss ∧p↓i⊃p\,,$$
where $i≤k$. A {\sl Horn formula\/} is a $k$-Horn
formula for some $k$. Let $Q↓k$ be the set of $(n,L)$, where $n$ is an
integer and $L$ a list of $k$-Horn formulas, such that among the variables
mentioned in $L$ is a set of $n$ that logically imply all mentioned variables
when the Horn formulas in $L$ are applied (treating $∧$ as logical ``and''
and $⊃$ as ``implies,'' of course).
For example,
$(2,\{p∧q⊃r,\hskip10pt q∧r∧s⊃t,\hskip10pt s⊃p\})$
is in $Q↓3$, because $q$ and $s$
together imply all of $p$, $q$, $r$, $s$, and $t$. It is not in $Q↓2$ because
${q∧r∧s⊃t}$ is not a 2-Horn formula.
Let $Q=Q↓1∪Q↓2∪\ldotss$, that is, the set of $(n,L)$ where $L$ is a list
of arbitrary Horn formulas with a set of $n$ variables that imply all the
variables.
\yskip
a) Show that $Q$ is $\Nscr\Pscr$-complete.
\yskip
b) Investigate the complexity of the $Q↓i$'s. For what values of $i$, if
any, is $Q↓i$ {}
$\Nscr\Pscr$-complete? For what values of $i$ is $Q↓i$ recognizable in
polynomial time? If $Q↓i$ is polynomial, give the best time bound you can for
its recognition. Justify all answers.
\vfill\eject
\end